When applied to the same game, probability theory and game theory can disagree on calculated
values of the Fisher information, the log likelihood function, entropy gradients, the rank and
Jacobian of variable transforms, and even the dimensionality and volume of the underlying probability
parameter spaces. These differences arise as probability theory employs structure preserving
isomorphic mappings when constructing strategy spaces to analyze games. In contrast, game theory uses
weaker mappings which change some of the properties of the underlying probability distributions
within the mixed strategy space. In this paper, we explore how using strong isomorphic mappings
to define game strategy spaces can alter rational outcomes in simple games, and might resolve some
of the paradoxes of game theory.

I. INTRODUCTION



One possibly fruitful way to gain insight into the para- 
doxes of game theory is to show that probability the- 
ory and game theory analyze simple games differently. 
It would be expected of course that these two well de- 
veloped fields should always produce consistent results. 
However, we will show in this paper that probability the- 
ory and game theory can produce contradictory results 
when applied to even simple games. These differences 
arise as these two fields construct mixed and behavioural 
strategy spaces differently. 

The mixed strategy space of game theory is con- 
structed, according to von Neumann and Morgenstern 
[1], by first making a listing of every possible combination
of moves that players might make and of all possible
information states that players might possess. This
complete embodiment of information then allows every 
move combination to be mapped into a probability sim- 
plex whereby each player's mixed strategy probability 
parameters belong to "disjoint but exhaustive alterna- 
tives, . . . subject to the [usual normalization] conditions 
... and to no others." [1]. The resulting unconstrained
mixed strategy space is then a "complete set" of all pos- 
sible probability distributions that might describe the 
moves of a game [U-Q- Further, the absence of any con- 
straints other than for normalization ensures "trembles" 
or "fluctuations" are always present within the mixed 
strategy space so every possible pure strategy probabil- 
ity distribution is played with non-zero (but possibly in- 
finitesimal) probability @. Together, these properties of 
the mixed strategy space — a complete set of "contained" 
probability distributions, no additional constraints, and 
ever present trembles — lead to inconsistencies with prob- 
ability theory. 

In constructing a mixed strategy space, probability 
theory first examines how subsidiary probability distri- 
butions can be "contained" within a mixed space and 
whether the properties of the probability distributions 
are altered as a result. Probability theory uses isomor- 
phisms to implement mappings of one probability space 
into another space. An isomorphism is a structure pre- 
serving mapping from one space to another space. In 
abstract algebra for instance, an isomorphism between 
vector spaces is a bijective (one-to-one and onto) linear 
mapping between the spaces with the implication that
two vector spaces are isomorphic if and only if their dimensionality
is identical [7]. When the preservation of
structure is exact, then calculations within either space 
must give identical results. Conversely, if the degree of 
structure preservation is less than exact, then differences 
can arise between calculations performed in each space. 
It is thus crucial to examine the fidelity of the "contain- 
ment" mappings used to construct the mixed spaces of 
game theory. 

Probability theory defines isomorphic probability spaces as follows. First, a probability
space $\mathcal{P} = \{\Omega, \sigma, P\}$ consists of a set of events $\Omega$, a sigma-algebra
$\sigma$ of all subsets of those events, and a probability measure $P$ defined over the events.
Two probability spaces $\mathcal{P} = \{\Omega, \sigma, P\}$ and
$\mathcal{P}' = \{\Omega', \sigma', P'\}$ are said to be strictly isomorphic if there is a
bijective map $f : \Omega \to \Omega'$ which exactly preserves assigned probabilities, so for all
$e \in \Omega$ we have $P(e) = P'[f(e)]$. A slight weakening of this definition defines an
isomorphism as a bijective mapping $f$ of some unit probability subset of $\Omega$ onto a unit
probability subset of $\Omega'$. That is, the weakened mapping ignores null event subsets of zero
probability. This definition and equivalent ones are given in Refs. [8-10]. In particular,
we note that strong isomorphisms between source and target probability spaces require they have
identical dimensionality and tangent spaces [11].

The mixed strategy space of game theory "contains" 
different probability distributions with many possessing 
different dimensionality (according to probability the- 
ory). Their altered dimensionality within the mixed 
space can alter those computed outcomes dependent on 
dimensionality. A simple functional illustration of this 
process can make this clear. A 1-dimensional function
$f(x)$ can be embedded within a 2-dimensional function
$g(x,y)$ in two ways: using constraints, $g(x,y_0) = f(x)$, or
limits, $\lim_{y \to y_0} g(x,y) = f(x)$. In either case, many of the
properties of the source function $f(x)$ are preserved, but
not necessarily all of them. In particular, these different
methods alter gradient optimization calculations. That
is, the gradient is properly calculated when constraints
are used, $f'(x) = g'(x,y_0)$, but not when a limit process
is used, $f'(x) \neq \lim_{y \to y_0} \nabla g(x,y)$ (where $\nabla$ indicates a
gradient operator).
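
The distinction between the constrained derivative and the limit of the full gradient can be checked numerically. The sketch below is only an illustration: the functions `f` and `g` are hypothetical choices (not taken from the paper) satisfying $g(x, y_0) = f(x)$ with $y_0 = 0$, and the derivatives are approximated by central differences.

```python
def f(x):
    # Hypothetical 1-dimensional source function.
    return x * x


def g(x, y):
    # Hypothetical 2-dimensional embedding with g(x, 0) = f(x).
    return x * x + y


def partial(func, point, index, h=1e-6):
    """Central-difference partial derivative of func at point along one axis."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2.0 * h)


x, y0 = 1.5, 0.0

# Source derivative f'(x).
f_prime = partial(f, (x,), 0)

# Constraint approach: y is pinned to y0, so only the x direction exists.
constrained = partial(g, (x, y0), 0)

# Limit approach: y is merely close to y0, so the gradient keeps a y component.
y_near = y0 + 1e-9
limit_gradient = (partial(g, (x, y_near), 0), partial(g, (x, y_near), 1))
```

For this choice the constrained derivative reproduces $f'(x) = 2x$, while the limit gradient retains a unit component in the $y$ direction however close $y$ is taken to $y_0$.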

In this paper, we will show that exactly the same dis- 
crepancies arise when probability theory and game theory 
are applied to simple probability spaces, and that these
discrepancies can be significant. It is useful to indicate 
the magnitude of these discrepancies here to motivate the 
paper (with full details given in later sections below) . We 
consider a simple card game with two potentially correlated
variables $x, y \in \{0,1\}$ with joint probability distribution
$P_{xy}$. In the case where x and y are perfectly
correlated, probability theory (denoted by P) and game
theory (denoted by G) respectively assign different dimensions
to both the Fisher information matrix ($F$) and
the gradient of the log likelihood function ($\nabla L$), and can
disagree on the value of the gradient of the joint entropy
at some points ($\nabla E_{xy}$):
These fields also disagree on the probability space gradients
of both the normalization condition ($P_{00} + P_{11} = 1$)
and the requirement that the joint entropy equates to the
Should these fields model a change of variable within this
game, they further disagree on the rank of the transform
matrix ($A$), and on the invertibility of the Jacobian matrix
($J$):
These fields even disagree on the dimension ($d$) and volume
($V$) of the minimal probability space used to analyze
the game:
The differences between game theory and probability theory
arise due to the different use of isomorphic mappings
to construct mixed strategy spaces.

In Section II we show the necessity for considering isomorphic
probability spaces using examples ranging from
simple dice games to bivariate normal distributions. Section III
collects results for the mixed and behavioural
strategy spaces of a simple two-stage game and again
establishes the necessity for taking account of isomorphic
probability distributions. We apply these results in
Section IV to optimizing highly nonlinear random functions
over a decision tree involving correlated variables.
This section is then generalized and applied to a strategic
game in Section V. Throughout, we place the details of
many calculations within the Appendices to show working
and avoid cluttering the paper.



II. OPTIMIZATION AND ISOMORPHIC 
PROBABILITY SPACES 

In this section, we introduce the need to use iso- 
morphic mappings when embedding probability spaces 
within mixed spaces. 

FIG. 1: Three alternate dice with different numbers of sides.
A coin with sides A and B appearing with respective proba- 
bilities a and b, a triangle with faces A, B and C occurring 
with respective probabilities a, b and c, and a square die with 
faces A,B,C and D each occurring with respective probabili- 
ties a, b, c and d. 



A. Isomorphic dice 

Consider the three alternate dice shown in Fig. 1, representing
a 2-sided coin, a 3-sided triangle, and a 4-sided
square. Faces are labeled with capital letters and the
probabilities of each face appearing are labeled with the
corresponding small letter. The corresponding probability
spaces defined by these dice are

$$
\begin{aligned}
\mathcal{P}_{\rm coin} &= \{x \in \{A,B\}, \{a,b\}\} \\
\mathcal{P}_{\rm triangle} &= \{x \in \{A,B,C\}, \{a,b,c\}\} \\
\mathcal{P}_{\rm square} &= \{x \in \{A,B,C,D\}, \{a,b,c,d\}\}.
\end{aligned}
\tag{5}
$$

Here the required sigma-algebras are not listed, and each
of these spaces is subject to the usual normalization
conditions. For notational convenience we sometimes
write $(p_1,p_2,p_3,p_4) = (a,b,c,d)$ and denote the number
of sides of each respective die as $n \in \{2,3,4\}$.

We now wish to optimize a nonlinear function over 
these spaces, and we choose a function which cannot 
be optimized using standard approaches in game theory. 
The chosen function is

$$
F = V^2 E_x \tag{6}
$$

with

$$
V = \int_{\rm space} dv, \qquad E_x = -\sum_{i=1}^{n} p_i \log p_i, \tag{7}
$$

where $V$ is the volume of each respective probability parameter
space and $E_x$ is the marginal entropy of each
space [12]. We will complete this optimization in three
different ways, two of which will be consistent with each
other and inconsistent with the third.

As a first pass at optimizing the function F, we simply 
maximize F within each probability space and then com- 
pare the optimal outcomes to determine the best achiev- 
able outcome. As is well understood, the entropy of a set 
of $n$ events is maximized when those events are equiprobable,
giving a maximum entropy of $E_{x,\max} = \log n$. Using
the volume results of Eqs. (A2)-(A4), the function $F$ takes
maximum values in the three probability spaces of

$$
\begin{aligned}
F_{\rm coin, max} &= \log 2 \\
F_{\rm triangle, max} &= \frac{\log 3}{4} \\
F_{\rm square, max} &= \frac{\log 4}{36}.
\end{aligned}
\tag{8}
$$



Comparing these outcomes makes it clear that the best 
that can be achieved is to use a coin with equiprobable 
faces. 
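
The comparison in Eq. (8) is easy to reproduce. The sketch below assumes that the parameter space of an $n$-sided die is the standard simplex in $n-1$ free coordinates, with volume $1/(n-1)!$; this is an assumption inferred from the values quoted in Eq. (8) (volumes 1, 1/2 and 1/6), since the appendix volume results are not reproduced here. The function names are illustrative.

```python
import math


def simplex_volume(n):
    # Assumed volume of the (n-1)-dimensional probability parameter space
    # of an n-sided die: 1/(n-1)!  (gives 1, 1/2, 1/6 for n = 2, 3, 4).
    return 1.0 / math.factorial(n - 1)


def f_max(n):
    # F = V^2 * E_x, evaluated at the equiprobable point where E_x = log n.
    return simplex_volume(n) ** 2 * math.log(n)


dice = {"coin": 2, "triangle": 3, "square": 4}
f_values = {name: f_max(n) for name, n in dice.items()}

# The best achievable outcome when each space is optimized separately.
best = max(f_values, key=f_values.get)
```

Under these assumptions `f_values` holds $\log 2$, $\log(3)/4$ and $\log(4)/36$, and `best` selects the coin.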

The second method uses isomorphisms to map all of
the three incommensurate source spaces into a single target
space. We choose our mappings as follows:

$$
\begin{aligned}
\mathcal{P}'_{\rm coin} &= \{x \in \{A,B,C,D\}, \{a,b,c,d\}\}\big|_{(c,d)=(0,0)} \\
\mathcal{P}'_{\rm triangle} &= \{x \in \{A,B,C,D\}, \{a,b,c,d\}\}\big|_{d=0} \\
\mathcal{P}'_{\rm square} &= \{x \in \{A,B,C,D\}, \{a,b,c,d\}\}.
\end{aligned}
\tag{9}
$$

Here, while all probability spaces share a common event
set and probability distribution, the isomorphic mappings
impose constraints on the $\mathcal{P}'_{\rm coin}$ and $\mathcal{P}'_{\rm triangle}$ spaces.
The constraints arise from mapping the null sets of zero
probability from each source space to the corresponding
events of the enlarged target space. The target probability
space is shown in Fig. 2 where the normalization
condition $d = 1 - a - b - c$ is used. The points corresponding
to the probability spaces of the coin $\mathcal{P}'_{\rm coin}$
are mapped along the line $a + b = 1$ with constraint
$(c,d) = (0,0)$. Those points corresponding to the probability
spaces of the triangle $\mathcal{P}'_{\rm triangle}$ are mapped along
the surface $a + b + c = 1$ with constraint $d = 0$. Finally,
the probability spaces corresponding to the square
$\mathcal{P}'_{\rm square}$ fill the volume $a + b + c + d = 1$ and are not subject
to any other constraint.

The interesting point about the target space is that
many points, e.g. $(a,b,c,d) = (\frac{1}{2}, \frac{1}{2}, 0, 0)$, lie in all of the
probability spaces of the coin, triangle, and square die
and are only distinguished by which constraints are acting.
That is, when this point is subject to the constraint
$(c,d) = (0,0)$, then it corresponds to the probability
space $\mathcal{P}'_{\rm coin}$ (and not to any other). Conversely, when this
same point is subject to an imposed constraint $d = 0$ then
it corresponds to the probability space $\mathcal{P}'_{\rm triangle}$. Finally,
when no constraints apply then, and only then, does this
point correspond to the probability space of the square
$\mathcal{P}'_{\rm square}$. This means that it is not the probability values
possessed by a point which determines its corresponding
probability space but the probability values in combination
with the constraints acting at that point.

It is now straightforward to use the isomorphically constrained
target space to maximize the function $F$ over all
embedded probability spaces using standard constrained
optimization techniques. For instance, to optimize $F$
over points corresponding to the coin and subject to
the constraint $(c,d) = (0,0)$, either simply resolve
the constraint by setting $c = d = 0$ before the optimization
begins, or simply evaluate the gradient of $F$
at all points $(a,b,0,0)$ in the direction of the unit vector
$\frac{1}{\sqrt{2}}(-1,1,0,0)$ lying along the line $a + b = 1$. (See
Eq. (A6).) An optimization over all three isomorphic constraints
leads to the same outcomes as obtained previously
in Eq. (8). This completes the second optimization
analysis and as promised, it is consistent with the results
of the first.

FIG. 2: The target space containing points corresponding to
the probability spaces respectively of the coin $\mathcal{P}'_{\rm coin}$ along the
line $a + b = 1$ with constraint $(c,d) = (0,0)$ (heavy line),
of the triangle $\mathcal{P}'_{\rm triangle}$ along the surface $a + b + c = 1$ with
constraint $d = 0$ (hashed surface), and of the square $\mathcal{P}'_{\rm square}$
filling the volume $a + b + c + d = 1$ (filled polygon). Note
that points such as $(a,b,c) = (0.5, 0.5, 0)$ correspond to all
three probability spaces and are only distinguished by which
constraints are acting.
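
The directional-gradient form of the constrained optimization for the coin can be sketched as follows. Since the volume factor is constant along the coin line, only the entropy needs to be differentiated; this is a minimal numerical illustration using a central difference along the unit vector $(-1,1,0,0)/\sqrt{2}$, and the function names are illustrative rather than taken from the paper.

```python
import math


def entropy(p):
    # Shannon entropy; zero-probability events contribute nothing.
    return -sum(q * math.log(q) for q in p if q > 0.0)


def directional_derivative(a, h=1e-6):
    """Derivative of the entropy at (a, 1-a, 0, 0) along u = (-1, 1, 0, 0)/sqrt(2).

    The constraint (c, d) = (0, 0) is respected because u has no
    component in the c or d directions.
    """
    s = h / math.sqrt(2.0)
    forward = entropy((a - s, 1.0 - a + s, 0.0, 0.0))
    backward = entropy((a + s, 1.0 - a - s, 0.0, 0.0))
    return (forward - backward) / (2.0 * h)
```

The derivative changes sign at $a = 1/2$, which locates the equiprobable coin as the constrained optimum, in line with the first method.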

The same is not true of the third optimization ap- 
proach which produces results inconsistent with the first 
two. The reason we present this method is that it is 
in common use in game theory. The third optimization 
method commences by noting that the probability space
of the square is complete in that it already "contains" all
of the probability spaces of the triangle and of the coin. This
allows a square probability space to mimic a coin probability
space by simply taking the limit $(c,d) \to (0,0)$.
Similarly, the square mimics the triangle through the
limit $d \to 0$. In turn, this means that an optimization
over the space of the square is effectively an optimization
over every choice of space within the square. Specifically,
game theory discards constraints to model the choice between
contained probability spaces. This optimization
over the points of the square has already been completed
above. When optimizing the function $F$ over the unconstrained
points corresponding to the square, the maximum
value is $F = \log(4)/36$ at $(a,b,c,d) = (\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4})$,
and according to game theory, this is the best outcome
when players have a choice between the coin, the triangle,
or the square.

The optimum result obtained by the third optimization 
method, that used by game theory, conflicts with those 
found by the previous two methods as commonly used in
probability theory. The difference arises as game theory
models a choice between probability spaces by making
players uncertain about the values of their probability
parameters within any probability space. Consequently,
their probability parameters are always subject to infinitesimal
fluctuations, i.e. $c > 0^{+}$ or $d > 0^{+}$ always.
These fluctuations alter the dimensions of the space 
which impacts on the calculation of the volume V and 
alters the calculated gradient of the entropy. Game the- 
ory eschews the role of isomorphism constraints within 
probability spaces on the grounds that any such con- 
straints restrict player uncertainty and hence their ability 
to choose between different probability spaces. The prob- 
ability parameter fluctuations mean that players have ac- 
cess to all possible probability dimensions at all times so 
a single mixed space is the appropriate way to model the 
choice between contained probability spaces. In contrast, 
probability theory holds that the choice between proba- 
bility spaces introduces player uncertainty about which 
space to use, but specifically does not introduce uncer- 
tainty into the parameters within any individual prob- 
ability space. As a result, probability theory employs 
isomorphic constraints to ensure that the properties of 
each embedded probability space within the mixed space 
are unchanged. 

The upshot is that a game theorist cannot evaluate
the entropy (or uncertainty) gradient of a coin toss
while considering alternate dice because uncertainty about
which die is used bleeds into the entropy calculation.
However, the probability theorist will distinguish between
their uncertainty about which face of the coin will
appear and their uncertainty about which die is being
used.



B. Continuous bivariate Normal spaces 



The above results are general. When source probability
spaces are embedded within target probability spaces,
then the use of isomorphic mapping constraints will preserve
all properties of the embedded spaces. Conversely,
when constraints are not used then some of the properties
of the embedded spaces will not be preserved in general.
We illustrate this now using normally distributed continuous
random variables.

Consider two normally distributed continuous independent
random variables x and y with $x, y \in (-\infty, \infty)$.
When independent, these variables have a joint probability
distribution $P_{xy}$ which is continuous and differentiable
in six variables, $P_{xy}(x, \mu_x, \sigma_x, y, \mu_y, \sigma_y)$, where the
respective means are $\mu_x$ and $\mu_y$ and the variances are $\sigma_x$
and $\sigma_y$. The marginal distributions are $P_x(x, \mu_x, \sigma_x)$ and
$P_y(y, \mu_y, \sigma_y)$. (See the Appendices.)

The independent joint distribution $P_{xy}$ can now be embedded
into an enlarged distribution representing two potentially
correlated normally distributed variables x and
y. This enlarged distribution $P'_{xy}(x, \mu_x, \sigma_x, y, \mu_y, \sigma_y, \rho)$
differs from $P_{xy}$ in its dependence on the correlation parameter
$\rho_{xy} = \rho$ with $\rho \in (-1,1)$. This distribution
is continuous and differentiable in seven variables. (See
Eq. (A9).) An isomorphic embedding requires that the
unit probability subset of $P_{xy}$ be mapped onto the unit
probability subset of $P'_{xy}$ and this is achieved by imposing
an external constraint that $\rho = 0$ in the enlarged space.
Hence, we expect $P'_{xy}|_{\rho=0} = P_{xy}$. It is readily confirmed
that when the isomorphism constraint is imposed on the
enlarged distribution all properties are preserved, while
this is not the case in the absence of the constraint. The
probability distributions must satisfy a number of gradient
relations (with the gradient operator $\nabla$ a function of
seven variables), for instance

$$
\begin{aligned}
\nabla \left[ P'_{xy} - P'_x P'_y \right] \big|_{\rho=0} &= 0 \\
\lim_{\rho \to 0} \nabla \left[ P'_{xy} - P'_x P'_y \right] &\neq 0 \\
\nabla \left[ P'_{x|y} - P'_x \right] \big|_{\rho=0} &= 0 \\
\lim_{\rho \to 0} \nabla \left[ P'_{x|y} - P'_x \right] &\neq 0.
\end{aligned}
\tag{10}
$$

(See Eq. (A14).) Similarly, the expectations of functions
of the x and y variables must also satisfy a number of
gradient relations (with the gradient operator $\nabla$ now a
function of five variables), for instance

$$
\begin{aligned}
\nabla \left[ \langle xy \rangle' - \langle x \rangle' \langle y \rangle' \right] \big|_{\rho=0} &= 0 \\
\lim_{\rho \to 0} \nabla \left[ \langle xy \rangle' - \langle x \rangle' \langle y \rangle' \right] &\neq 0.
\end{aligned}
\tag{11}
$$

(See Eq. (A16).)



FIG. 3: A four-sided square probability space where joint variables
x and y take values $(x,y) \in \{(0,0), (0,1), (1,0), (1,1)\}$
with respective probabilities $(a,b,c,d)$.
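
For the bivariate normal embedding of Section II B, the failure of the limit process can be seen in the $\rho$ component of the gradient. The sketch below uses the standard bivariate normal density and, as an assumption of this illustration, treats $\sigma_x$ and $\sigma_y$ as standard deviations. It evaluates the separability gap $P'_{xy} - P_x P_y$ and its derivative with respect to $\rho$ by central differences; all function names are illustrative.

```python
import math


def normal_pdf(z, mu, sigma):
    # Univariate normal density with mean mu and standard deviation sigma.
    u = (z - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))


def bivariate_pdf(x, y, mu_x, s_x, mu_y, s_y, rho):
    # Standard bivariate normal density with correlation rho in (-1, 1).
    zx = (x - mu_x) / s_x
    zy = (y - mu_y) / s_y
    one_minus = 1.0 - rho * rho
    norm = 2.0 * math.pi * s_x * s_y * math.sqrt(one_minus)
    quad = zx * zx - 2.0 * rho * zx * zy + zy * zy
    return math.exp(-quad / (2.0 * one_minus)) / norm


def separability_gap(x, y, rho, mu_x=0.0, s_x=1.0, mu_y=0.0, s_y=1.0):
    # P'_xy - P_x P_y, which vanishes identically when rho = 0.
    joint = bivariate_pdf(x, y, mu_x, s_x, mu_y, s_y, rho)
    return joint - normal_pdf(x, mu_x, s_x) * normal_pdf(y, mu_y, s_y)


def d_gap_d_rho(x, y, rho, h=1e-5):
    # Central-difference derivative of the gap along the rho direction.
    return (separability_gap(x, y, rho + h) - separability_gap(x, y, rho - h)) / (2.0 * h)
```

At $\rho = 0$ the gap itself is zero, yet its $\rho$ derivative equals $P_x P_y z_x z_y$ (with $z$ the standardized variables), which is generally non-zero. The constrained space has no $\rho$ direction, so this component never appears there.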



C. Joint probability space optimization 

We will briefly now examine isomorphisms between the
joint probability spaces of two arbitrarily correlated random
variables. In particular, we consider two random
variables x, y as appear on the square dice of Fig. 3 with
probability space

$$
\mathcal{P}_{\rm square} = \{(x,y) \in \{(0,0), (0,1), (1,0), (1,1)\}, \{a,b,c,d\}\}. \tag{12}
$$

The correlation between the x and y variables is

$$
\rho_{xy} = \frac{\langle xy \rangle - \langle x \rangle \langle y \rangle}{\sigma_x \sigma_y}
= \frac{ad - bc}{\sqrt{(c+d)(a+b)(b+d)(a+c)}}. \tag{13}
$$

Here, $\sigma_x$ and $\sigma_y$ are the respective standard deviations
of the x and y variables.
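
The closed form in Eq. (13) can be checked against a direct computation from the moments. In this sketch the probabilities $(a,b,c,d)$ are attached to $(x,y) = (0,0), (0,1), (1,0), (1,1)$ as in Fig. 3; the function names are illustrative.

```python
import math


def correlation_from_moments(a, b, c, d):
    # Moments of binary x and y with P(0,0)=a, P(0,1)=b, P(1,0)=c, P(1,1)=d.
    mean_x = c + d
    mean_y = b + d
    mean_xy = d
    var_x = mean_x * (1.0 - mean_x)
    var_y = mean_y * (1.0 - mean_y)
    return (mean_xy - mean_x * mean_y) / math.sqrt(var_x * var_y)


def correlation_closed_form(a, b, c, d):
    # Closed form of Eq. (13).
    return (a * d - b * c) / math.sqrt((c + d) * (a + b) * (b + d) * (a + c))


rho_general = correlation_closed_form(0.4, 0.1, 0.2, 0.3)
rho_perfect = correlation_closed_form(0.5, 0.0, 0.0, 0.5)
rho_independent = correlation_closed_form(0.25, 0.25, 0.25, 0.25)
```

The two functions agree wherever both variances are non-zero. The perfectly correlated point gives $\rho_{xy} = 1$ and the uniform point gives $\rho_{xy} = 0$.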
The space $\mathcal{P}_{\rm square}$ of course contains many embedded
or contained spaces. We will separately consider the case
where x and y are perfectly correlated, and where they
are independent. As noted previously, there are two distinct
ways for these spaces to be contained within $\mathcal{P}_{\rm square}$,
namely using isomorphism constraints or using limit processes.
These two ways give the respective definitions for
the perfectly correlated case

$$
\begin{aligned}
\mathcal{P}_{\rm corr} &= \{(x,y) \in \{(0,0), (0,1), (1,0), (1,1)\}, \{a,b,c,d\}\}\big|_{b=c=0} \\
\mathcal{P}'_{\rm corr} &= \lim_{(b,c) \to (0,0)} \{(x,y) \in \{(0,0), (0,1), (1,0), (1,1)\}, \{a,b,c,d\}\}
\end{aligned}
\tag{14}
$$

and for the independent case

$$
\begin{aligned}
\mathcal{P}_{\rm ind} &= \{(x,y) \in \{(0,0), (0,1), (1,0), (1,1)\}, \{a,b,c,d\}\}\big|_{ad=bc} \\
\mathcal{P}'_{\rm ind} &= \lim_{ad \to bc} \{(x,y) \in \{(0,0), (0,1), (1,0), (1,1)\}, \{a,b,c,d\}\}.
\end{aligned}
\tag{15}
$$

Here, all spaces satisfy the normalization constraint
$a + b + c + d = 1$, which we typically resolve using
$d = 1 - a - b - c$. Evaluating any function dependent
on a gradient or completing an optimization task using
either isomorphic constraints or limit processes can naturally
result in different outcomes as we now illustrate.



1. Perfectly correlated probability spaces 

We first consider the case where the x and y variables
are perfectly correlated in the spaces $\mathcal{P}_{\rm corr}$ with isomorphism
constraints or $\mathcal{P}'_{\rm corr}$ using limit processes.

The maximum achievable joint entropy [12] for our
two perfectly correlated variables obviously occurs at the
point where they are equiprobable. This can be found by
evaluating the gradient of the joint entropy function

$$
E_{xy}(a,b,c) = -\sum_{x,y} P_{xy} \log P_{xy}. \tag{16}
$$

In the space $\mathcal{P}_{\rm corr}$, the gradient optimization
$\nabla E_{xy}|_{b=c=0} = 0$ locates an optimum point at
$(a,b,c,d) = (\frac{1}{2}, 0, 0, \frac{1}{2})$, while in the space $\mathcal{P}'_{\rm corr}$
the optimum at $\nabla E_{xy} = 0$ locates the point
$(a,b,c,d) = (\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4})$. (See the Appendices.)
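
The two optima quoted above can be reproduced with a coarse grid search over the joint entropy of Eq. (16). This is a minimal sketch: the grid resolution is an arbitrary choice, and the search is only meant to show that the constrained and unconstrained maximizations land on different points.

```python
import math
from itertools import product


def joint_entropy(a, b, c):
    # Joint entropy with d resolved by normalization.
    d = 1.0 - a - b - c
    return -sum(p * math.log(p) for p in (a, b, c, d) if p > 0.0)


steps = 20
grid = [i / steps for i in range(steps + 1)]

# Isomorphism constraint b = c = 0: only the parameter a is free.
constrained = max(((a, 0.0, 0.0) for a in grid),
                  key=lambda p: joint_entropy(*p))

# No constraint: search the whole (a, b, c) simplex.
unconstrained = max((p for p in product(grid, repeat=3) if sum(p) <= 1.0 + 1e-12),
                    key=lambda p: joint_entropy(*p))
```

The constrained search returns $a = 1/2$ with $b = c = 0$ (so $d = 1/2$), while the unconstrained search returns the uniform point with all four probabilities equal to $1/4$.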

The Fisher Information is defined in terms of probability
space gradients as the amount of information obtained
about a probability parameter from observing any event
[13]. It is a matrix $F_{ij}$ with indices $i,j \in \{1,2,3\}$.
In the isomorphically constrained space $\mathcal{P}_{\rm corr}$ the Fisher
Information reduces to the scalar

$$
F_{11} = \frac{1}{a(1-a)} \tag{17}
$$

equal to the inverse of the variance as required. A very
different result is obtained in the unconstrained space
$\mathcal{P}'_{\rm corr}$ where the Fisher Information is a much larger matrix.
(See the Appendices.)
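
The change in dimension of the Fisher Information can be made concrete with the standard definition $F_{ij} = \sum_e P(e)\, \partial_i \log P(e)\, \partial_j \log P(e)$. The sketch below is an illustration under that definition, with $d$ eliminated by normalization; the closed form $F_{ij} = \delta_{ij}/p_i + 1/d$ used for the unconstrained case follows from it and is not quoted from the paper's appendix.

```python
def fisher_constrained(a):
    # Constrained space b = c = 0: events (0,0) and (1,1) with
    # probabilities a and 1 - a, and a single free parameter a.
    score_00 = 1.0 / a            # d log(a) / da
    score_11 = -1.0 / (1.0 - a)   # d log(1 - a) / da
    return a * score_00 ** 2 + (1.0 - a) * score_11 ** 2


def fisher_unconstrained(a, b, c):
    # Unconstrained space: three free parameters (a, b, c), d = 1 - a - b - c.
    # F_ij = delta_ij / p_i + 1 / d, a 3 x 3 matrix.
    d = 1.0 - a - b - c
    probs = (a, b, c)
    return [[(1.0 / probs[i] if i == j else 0.0) + 1.0 / d for j in range(3)]
            for i in range(3)]


scalar_info = fisher_constrained(0.3)
matrix_info = fisher_unconstrained(0.25, 0.25, 0.25)
```

The constrained result is the scalar $1/[a(1-a)]$, whereas the unconstrained result is a 3 x 3 matrix whose $b$ and $c$ diagonal entries grow without bound as $(b,c) \to (0,0)$.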

Probability parameter gradients also allow estimation
of probability parameters by locating points where the
log likelihood function is maximized, $\nabla \log L = 0$ [12].
This evaluation takes very different forms in the isomorphically
constrained space $\mathcal{P}_{\rm corr}$ and the unconstrained
space $\mathcal{P}'_{\rm corr}$ as shown in Eq. (A24). Coincidentally however,
in our case the same estimated outcomes can be
achieved in both spaces. For example, if an observation
of $n$ trials shows $n_a$ instances of $(x,y) = (0,0)$ and $n - n_a$
instances of $(x,y) = (1,1)$ then both constrained and
unconstrained approaches give the best estimates of the
probability parameters of $(a,b,c,d) = (\frac{n_a}{n}, 0, 0, 1 - \frac{n_a}{n})$.

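Finally, when x and y are perfectly correlated it is
necessarily the case that expectations satisfy $\langle x \rangle - \langle y \rangle = 0$,
that variances satisfy $V(x) - V(y) = 0$, that the joint
entropy is equal to the entropy of each variable so $E_{xy} - E_x = 0$,
and that finally, the correlation between these
variables satisfies $\rho_{xy} - 1 = 0$. All of these properties lead
to gradient relations in the $\mathcal{P}_{\rm corr}$ and $\mathcal{P}'_{\rm corr}$ spaces of:

$$
\begin{aligned}
\nabla \left[ \langle x \rangle - \langle y \rangle \right] \big|_{b=c=0} &= 0 \\
\lim_{(b,c) \to (0,0)} \nabla \left[ \langle x \rangle - \langle y \rangle \right] &= -\hat{b} + \hat{c} \\
\nabla \left[ V(x) - V(y) \right] \big|_{b=c=0} &= 0 \\
\lim_{(b,c) \to (0,0)} \nabla \left[ V(x) - V(y) \right] &= (1 - 2a)\hat{b} - (1 - 2a)\hat{c} \\
\nabla \left[ E_{xy} - E_x \right] \big|_{b=c=0} &= 0 \\
\lim_{(b,c) \to (0,0)} \nabla \left[ E_{xy} - E_x \right] &\ \text{is undefined} \\
\nabla \rho_{xy} \big|_{b=c=0} &= 0 \\
\lim_{(b,c) \to (0,0)} \nabla \rho_{xy} &\neq 0.
\end{aligned}
\tag{18}
$$

Obviously, taking the limit $(b,c) \to (0,0)$ does not reduce
the limit equations to the required relations. (See Eq. (A25).)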



2. Independent probability spaces 

We next consider the case where the x and y variables
are independent using the spaces $\mathcal{P}_{\rm ind}$ with isomorphism
constraints or $\mathcal{P}'_{\rm ind}$ with limit processes.

When random variables are independent, then their
joint probability distribution is separable for every allowable
probability parameter of $\mathcal{P}_{\rm ind}$ or $\mathcal{P}'_{\rm ind}$. This
means the gradient of this separability property must
be invariant across both probability spaces. That is,
we must have both $P_{xy} = P_x P_y$ everywhere and hence
$\nabla [P_{xy} - P_x P_y] = 0$. Similarly, separability requires we
also satisfy $\nabla [\langle xy \rangle - \langle x \rangle \langle y \rangle] = 0$. Further, every independent
space must have conditional probabilities equal
to marginal probabilities and so satisfy $\nabla [P_{x|y} - P_x] = 0$.
Finally, two independent variables have joint entropy
equal to the sum of the individual entropies so every independent
space must satisfy $\nabla [E_{xy} - E_x - E_y] = 0$.
These relations evaluate differently in either $\mathcal{P}_{\rm ind}$ with
isomorphism constraints or $\mathcal{P}'_{\rm ind}$ with limit processes. We
have:

$$
\begin{aligned}
\nabla \left[ P_{xy}(00) - P_x(0) P_y(0) \right] \big|_{ad=bc} &= 0 \\
\lim_{ad \to bc} \nabla \left[ P_{xy}(00) - P_x(0) P_y(0) \right] &= \lim_{ad \to bc} \nabla (ad - bc) \neq 0 \\
\nabla \left[ \langle xy \rangle - \langle x \rangle \langle y \rangle \right] \big|_{ad=bc} &= 0 \\
\lim_{ad \to bc} \nabla \left[ \langle xy \rangle - \langle x \rangle \langle y \rangle \right] &= \lim_{ad \to bc} \nabla (ad - bc) \neq 0 \\
\nabla \left[ P_{x|y}(0|0) - P_x(0) \right] \big|_{ad=bc} &= 0 \\
\lim_{ad \to bc} \nabla \left[ P_{x|y}(0|0) - P_x(0) \right] &= \lim_{ad \to bc} \nabla \left[ \frac{ad - bc}{a + c} \right] \neq 0 \\
\nabla \left[ E_{xy} - E_x - E_y \right] \big|_{ad=bc} &= 0 \\
\lim_{ad \to bc} \nabla \left[ E_{xy} - E_x - E_y \right] &\neq 0.
\end{aligned}
\tag{19}
$$

(See Eqs. (A27) to (A29).)

FIG. 4: A schematic representation where a three dimensional
target probability strategy space (p, q, r) embeds respectively
several one dimensional probability spaces associated with perfectly
correlated variables (lines, upper left and lower right),
and a two dimensional probability space associated with independent
variables (plane, middle). An exact isomorphism
preserves the respective original tangent spaces shown via one
and two dimensional axes offset in background. A weak isomorphism
fails to preserve the original tangent spaces of the
source probability distributions and assigns the three dimensional
tangent space of the target space to every embedded
distribution (as shown in foreground slightly offset from each
embedded space).
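
The contrast between the constrained and limit gradients for the independent case can be illustrated numerically for the separability gap $P_{xy}(00) - P_x(0)P_y(0) = ad - bc$. In this sketch the independent surface $ad = bc$ is parameterized by the marginals $u = P_x(0)$ and $v = P_y(0)$; this parameterization and the function names are assumptions of the illustration.

```python
def gap(a, b, c):
    # P_xy(00) - P_x(0) P_y(0) = a - (a + b)(a + c) = a d - b c.
    d = 1.0 - a - b - c
    return a * d - b * c


def independent_point(u, v):
    # A point on the surface a d = b c with marginals P_x(0) = u, P_y(0) = v.
    return (u * v, u * (1.0 - v), (1.0 - u) * v)


def full_gradient(point, h=1e-6):
    # Gradient in the unconstrained (a, b, c) coordinates.
    grad = []
    for i in range(3):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grad.append((gap(*up) - gap(*down)) / (2.0 * h))
    return grad


def surface_derivative(u, v, h=1e-6):
    # Derivative along the independent surface (varying u at fixed v).
    return (gap(*independent_point(u + h, v)) - gap(*independent_point(u - h, v))) / (2.0 * h)


point = independent_point(0.3, 0.6)
unconstrained_grad = full_gradient(point)
within_surface = surface_derivative(0.3, 0.6)
```

Along the surface the gap stays zero and so does its derivative, while the gradient in the unconstrained coordinates is $(d - a, -(a + c), -(a + b))$, which does not vanish at the same point.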

D. Discussion 

There are two approaches to optimization over prob- 
ability spaces presented here. Probability theory uses 
isomorphic constraints to exactly preserve the proper- 
ties of embedded probability spaces and then compares 
these exactly calculated values. Game theory eschews the 
use of isomorphic constraints and in effect, argues that 
any uncertainty about which probability space to choose 
bleeds into many calculations within a given space and 
alters the calculated outcomes. 

When probability spaces are represented as geometries, 
then it is expected that at least some of the properties 
of the probability space will be rendered in geometric 
terms. How these geometrical properties are preserved 
when a probability space is embedded within another 
is the question. Probability theory requires the exact 
preservation of all properties of every source space and 
this is achieved by imposing different constraints on dif- 
ferent points within the target space. Game theory in 
contrast, imposes a single target space geometry onto 
every source probability space. One way to picture this 



FIG. 5: Every point within the (p, q, r) probability space shown 
specifies a particular state of correlation p xy (p,q,r) between 
the x and y variables. We show here several lines and sur- 
faces of constant correlation taking values from top left to bot- 
tom right of p xy = +1, +0.75, +0.25, 0, -0.25, -0.75, -1. The 
optimization of expectations at any point (p, q, r) must take 
account of correlated changes between x and y. 



is shown in Fig. [¥] This figure shows how probability 
theory exactly preserves the dimensionality and tangent 
spaces of embedded probability spaces, while game the- 
ory overwrites these properties of the embedded spaces 
with the corresponding properties of the mixed space. 

In probability theory, the different isomorphism con- 
straints and tangent spaces acting at each point de- 
fine non-intersecting lines and surfaces within the target 
space. Some of these are shown in Fig. [5] representing 
the (p, q, r) simplex of the two potentially correlated x 
and y variables (this behavioural space is defined in the 
next section). Here, each state of correlation is a con- 
stant and cannot vary during an optimization analysis 
so an optimization procedure must sequentially take account
of every possible correlation state between these
variables, setting $\rho_{xy} = \rho$ for all $\rho \in [-1,1]$. These optimum
points can then be compared to determine which
correlation state between x and y returns the best value. 

Unsurprisingly, these two distinct approaches can 
sometimes generate conflicting results. 



III. MIXED AND BEHAVIOURAL STRATEGY 
SPACES 

The different approaches of probability theory and
game theory to isomorphic embeddings also impact on
the definitions of mixed and behavioural strategy spaces.
As usual, we will compare these spaces both with and
without isomorphism constraints. Our focus will be on a
simple decision problem involving two random variables
$x, y \in \{0,1\}$ where y is potentially conditioned on x as
shown in the behavioural strategy decision tree of Fig. 6.

FIG. 6: A simple decision tree where potentially independent
or correlated variables x and y take values {0, 1} with the
probabilities shown. This defines the (p, q, r) behavioural
probability space.



A. Mixed strategy space $\mathcal{P}_M$

The mixed strategy space is denoted $\mathcal{P}_M$, and determines
the choice of x via a probability distribution $\alpha$
while the respective choices of y on the left branch of the
decision tree $y_l$ and on the right branch $y_r$ are determined
by an independent probability distribution $\beta$ according
to the following table:

$$
\begin{array}{c|cccc}
(y_l, y_r) = & (0,0) & (0,1) & (1,0) & (1,1) \\
(x,y) & \beta_0 & \beta_1 & \beta_2 & \beta_3 \\
\hline
\alpha_0 & (0,0) & (0,0) & (0,1) & (0,1) \\
\alpha_1 & (1,0) & (1,1) & (1,0) & (1,1).
\end{array}
\tag{20}
$$



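The mixed strategy simplex for each player is respectively
$S^X = \{(\alpha_0, \alpha_1) \in R^2_+ : \sum_j \alpha_j = 1\}$ and
$S^Y = \{(\beta_0, \beta_1, \beta_2, \beta_3) \in R^4_+ : \sum_j \beta_j = 1\}$.
The associated tangent spaces are $T^X = \{z \in R^2 : \sum_j z_j = 0\}$ and
$T^Y = \{z \in R^4 : \sum_j z_j = 0\}$, equivalent to every possible
positive or negative fluctuation in the probabilities of the
pure strategies of each player. The joint probability
distribution $P_{xy}(x, y)$ for $x$ and $y$ is

    $P_{xy}(0,0) = (1 - \alpha_1)(1 - \beta_2 - \beta_3)$
    $P_{xy}(0,1) = (1 - \alpha_1)(\beta_2 + \beta_3)$
    $P_{xy}(1,0) = \alpha_1 (1 - \beta_1 - \beta_3)$
    $P_{xy}(1,1) = \alpha_1 (\beta_1 + \beta_3)$.        (21)

Here, we have used normalization constraints to eliminate
$\alpha_0$ and $\beta_0$. The expectations of the $x$ and $y$ variables
are given by

    $\langle x \rangle = \alpha_1$
    $\langle y \rangle = \beta_2 + \beta_3 + \alpha_1 (\beta_1 - \beta_2)$
    $\langle xy \rangle = \alpha_1 (\beta_1 + \beta_3)$,        (22)

while their variances are

    $V(x) = \alpha_1 (1 - \alpha_1)$
    $V(y) = [\beta_2 + \beta_3 + \alpha_1(\beta_1 - \beta_2)] \times [1 - \beta_2 - \beta_3 - \alpha_1(\beta_1 - \beta_2)]$.        (23)

For completeness, we note the marginal and joint entropies are

    $E_x = -(1 - \alpha_1)\log(1 - \alpha_1) - \alpha_1 \log \alpha_1$
    $E_y = -[1 - \beta_2 - \beta_3 + \alpha_1(\beta_2 - \beta_1)] \log[1 - \beta_2 - \beta_3 + \alpha_1(\beta_2 - \beta_1)]$
          $- [\beta_2 + \beta_3 - \alpha_1(\beta_2 - \beta_1)] \log[\beta_2 + \beta_3 - \alpha_1(\beta_2 - \beta_1)]$
    $E_{xy} = -(1 - \alpha_1)(1 - \beta_2 - \beta_3) \log[(1 - \alpha_1)(1 - \beta_2 - \beta_3)]$
          $- (1 - \alpha_1)(\beta_2 + \beta_3) \log[(1 - \alpha_1)(\beta_2 + \beta_3)]$
          $- \alpha_1 (1 - \beta_1 - \beta_3) \log[\alpha_1 (1 - \beta_1 - \beta_3)]$
          $- \alpha_1 (\beta_1 + \beta_3) \log[\alpha_1 (\beta_1 + \beta_3)]$.        (24)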
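
The joint distribution and expectations above can be checked by direct enumeration of the strategy table. The following is a minimal sketch, not part of the original analysis; the function names `mixed_joint` and `expectations` are illustrative, and the branch convention (left branch $y_l$ when $x = 0$, right branch $y_r$ when $x = 1$) is the one implied by Eq. (21).

```python
from itertools import product


def mixed_joint(alpha1, beta):
    """Joint distribution P_xy implied by the mixed strategy table (20).

    alpha1 is the probability of x = 1; beta = (b0, b1, b2, b3) are the
    probabilities of the pure strategies (y_l, y_r) = (0,0), (0,1), (1,0), (1,1).
    """
    pure_y = [(0, 0), (0, 1), (1, 0), (1, 1)]
    joint = {xy: 0.0 for xy in product((0, 1), repeat=2)}
    for x, prob_x in ((0, 1.0 - alpha1), (1, alpha1)):
        for (y_l, y_r), prob_b in zip(pure_y, beta):
            # Assumed convention: left branch when x = 0, right branch when x = 1.
            y = y_l if x == 0 else y_r
            joint[(x, y)] += prob_x * prob_b
    return joint


def expectations(joint):
    """Return (<x>, <y>, <xy>) for a joint distribution over {0,1} x {0,1}."""
    ex = sum(p * x for (x, y), p in joint.items())
    ey = sum(p * y for (x, y), p in joint.items())
    exy = sum(p * x * y for (x, y), p in joint.items())
    return ex, ey, exy
```

Comparing the enumerated values with the closed forms of Eqs. (21) and (22) at an arbitrary interior point is a quick way to confirm which $\beta$ indices enter each formula.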



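Naturally, the mixed strategy probability space can
model any state of correlation between $x$ and $y$, with the
correlation given by

    $\rho_{xy}(\alpha_1, \beta_1, \beta_2, \beta_3) = \frac{\sqrt{\alpha_1 (1 - \alpha_1)}\, (\beta_1 - \beta_2)}{\sqrt{[\beta_2 + \beta_3 + \alpha_1(\beta_1 - \beta_2)][1 - \beta_2 - \beta_3 - \alpha_1(\beta_1 - \beta_2)]}}$.        (25)

Then, when $x$ and $y$ are perfectly correlated we have
$\rho_{xy} = 1$, requiring the constraints $\beta_1 = 1$ and
$\beta_0 = \beta_2 = \beta_3 = 0$. When $x$ and $y$ are perfectly
anti-correlated we have $\rho_{xy} = -1$, requiring the constraints
$\beta_2 = 1$ and $\beta_0 = \beta_1 = \beta_3 = 0$. Finally, when $x$ and
$y$ are independent we have $\rho_{xy} = 0$, requiring the constraint
$\beta_1 = \beta_2$.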



B. Behavioural strategy space $\mathcal{P}_B$

The behavioural strategy probability space [4] is denoted
$\mathcal{P}_B$ and is parameterized as shown in Fig. [6].
The behavioural strategy space for the players is
$S^{XY} = \{(p, q, r) \in R^3_+ : 0 \le p, q, r \le 1\}$ after taking
account of normalization. The associated tangent space is
$T^{XY} = \{z \in R^3\}$. The probability $P_{xy}(x, y)$ that $x$
and $y$ take on their respective values is

    $P_{xy}(0,0) = (1 - p)(1 - q)$
    $P_{xy}(0,1) = (1 - p) q$
    $P_{xy}(1,0) = p (1 - r)$
    $P_{xy}(1,1) = p r$.        (26)

This distribution gives the following expected values:

    $\langle x \rangle = p$
    $\langle y \rangle = q + p(r - q)$
    $\langle xy \rangle = p r$,        (27)

while the variances of the $x$ and $y$ variables are

    $V(x) = p(1 - p)$
    $V(y) = [q + p(r - q)][1 - q - p(r - q)]$.        (28)

The marginal and joint entropies between the $x$ and $y$
variables are

    $E_x = -(1 - p)\log(1 - p) - p \log p$
    $E_y = -[(1 - p)(1 - q) + p(1 - r)] \log[(1 - p)(1 - q) + p(1 - r)]$
          $- [(1 - p) q + p r] \log[(1 - p) q + p r]$
    $E_{xy} = -(1 - p)(1 - q) \log[(1 - p)(1 - q)]$
          $- (1 - p) q \log[(1 - p) q]$
          $- p(1 - r) \log[p(1 - r)]$
          $- p r \log[p r]$.        (29)
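$\rho_{xy} = 1$ | $\mathcal{P}_M$ | $\mathcal{P}_B$ | $\mathcal{P}_M\vert_{\beta_1 = 1}$ | $\mathcal{P}_B\vert_{(q,r) = (0,1)}$
Parameters | $\alpha_1, \beta_1, \beta_2, \beta_3$ | $p, q, r$ | $\alpha_1$ | $p$
Dimensions | 4 | 3 | 1 | 1
$\nabla$ operator | $\hat{\alpha}_1 \partial_{\alpha_1} + \hat{\beta}_1 \partial_{\beta_1} + \hat{\beta}_2 \partial_{\beta_2} + \hat{\beta}_3 \partial_{\beta_3}$ | $\hat{p}\, \partial_p + \hat{q}\, \partial_q + \hat{r}\, \partial_r$ | $\hat{\alpha}_1 \partial_{\alpha_1}$ | $\hat{p}\, \partial_p$
Gradient | $\lim_{\beta_1 \to 1} \nabla(\cdot)$ | $\lim_{(q,r) \to (0,1)} \nabla(\cdot)$ | $\nabla$ | $\nabla$
Probability conservation
$\nabla[P_{xy}(0,0) + P_{xy}(1,1)]$ | $\alpha_1 \hat{\beta}_1 - (1 - \alpha_1)\hat{\beta}_2 + (2\alpha_1 - 1)\hat{\beta}_3$ | $-(1 - p)\hat{q} + p\hat{r}$ | 0 | 0
$\nabla[P_{xy}(0,1) + P_{xy}(1,0)]$ | $-\alpha_1 \hat{\beta}_1 + (1 - \alpha_1)\hat{\beta}_2 - (2\alpha_1 - 1)\hat{\beta}_3$ | $(1 - p)\hat{q} - p\hat{r}$ | 0 | 0
Conditionals
$\nabla P_{x\vert y}(0\vert 0)$ | $\frac{\alpha_1}{1 - \alpha_1}(\hat{\beta}_1 + \hat{\beta}_3)$ | $\frac{p}{1 - p}\hat{r}$ | 0 | 0
$\nabla P_{x\vert y}(0\vert 1)$ | $\frac{1 - \alpha_1}{\alpha_1}(\hat{\beta}_2 + \hat{\beta}_3)$ | $\frac{1 - p}{p}\hat{q}$ | 0 | 0
Expectations
$\nabla\langle x \rangle$ | $\hat{\alpha}_1$ | $\hat{p}$ | $\hat{\alpha}_1$ | $\hat{p}$
$\nabla\langle y \rangle$ | $\hat{\alpha}_1 + \alpha_1\hat{\beta}_1 + (1 - \alpha_1)\hat{\beta}_2 + \hat{\beta}_3$ | $\hat{p} + (1 - p)\hat{q} + p\hat{r}$ | $\hat{\alpha}_1$ | $\hat{p}$
$\nabla\langle xy \rangle$ | $\hat{\alpha}_1 + \alpha_1\hat{\beta}_1 + \alpha_1\hat{\beta}_3$ | $\hat{p} + p\hat{r}$ | $\hat{\alpha}_1$ | $\hat{p}$
Variance
$\nabla[V(x) + V(y) - 2\mathrm{cov}(x,y)]$ | $-\alpha_1\hat{\beta}_1 + (1 - \alpha_1)\hat{\beta}_2 + (1 - 2\alpha_1)\hat{\beta}_3$ | $(1 - p)\hat{q} - p\hat{r}$ | 0 | 0
Entropy
$\nabla[E_{xy} - E_x]$ | … | … | 0 | 0
Correlation
$\nabla\rho_{xy}$ | … | … | 0 | 0

$\rho_{xy} = 0$ | $\mathcal{P}_M$ | $\mathcal{P}_B$ | $\mathcal{P}_M\vert_{\beta_1 = \beta_2}$ | $\mathcal{P}_B\vert_{r = q}$
Parameters | $\alpha_1, \beta_1, \beta_2, \beta_3$ | $p, q, r$ | $\alpha_1, \beta = \beta_2 + \beta_3$ | $p, q$
Dimensions | 4 | 3 | 2 | 2
$\nabla$ operator | $\hat{\alpha}_1 \partial_{\alpha_1} + \hat{\beta}_1 \partial_{\beta_1} + \hat{\beta}_2 \partial_{\beta_2} + \hat{\beta}_3 \partial_{\beta_3}$ | $\hat{p}\, \partial_p + \hat{q}\, \partial_q + \hat{r}\, \partial_r$ | $\hat{\alpha}_1 \partial_{\alpha_1} + \hat{\beta}\, \partial_{\beta}$ | $\hat{p}\, \partial_p + \hat{q}\, \partial_q$
Gradient | $\lim_{\beta_1 \to \beta_2} \nabla(\cdot)$ | $\lim_{r \to q} \nabla(\cdot)$ | $\nabla$ | $\nabla$
Probability
$\nabla[P_{xy}(0,0) - P_x(0)P_y(0)]$ | $\alpha_1(1 - \alpha_1)(\hat{\beta}_1 - \hat{\beta}_2)$ | $p(1 - p)(\hat{r} - \hat{q})$ | 0 | 0
$\nabla[P_{xy}(0,1) - P_x(0)P_y(1)]$ | $\alpha_1(1 - \alpha_1)(\hat{\beta}_2 - \hat{\beta}_1)$ | $p(1 - p)(\hat{q} - \hat{r})$ | 0 | 0
$\nabla[P_{xy}(1,0) - P_x(1)P_y(0)]$ | $\alpha_1(1 - \alpha_1)(\hat{\beta}_2 - \hat{\beta}_1)$ | $p(1 - p)(\hat{q} - \hat{r})$ | 0 | 0
$\nabla[P_{xy}(1,1) - P_x(1)P_y(1)]$ | $\alpha_1(1 - \alpha_1)(\hat{\beta}_1 - \hat{\beta}_2)$ | $p(1 - p)(\hat{r} - \hat{q})$ | 0 | 0
Conditionals
$\nabla[P_{x\vert y}(0\vert 0) - P_x(0)]$ | $\frac{\alpha_1(1 - \alpha_1)}{1 - \beta_1 - \beta_3}(\hat{\beta}_1 - \hat{\beta}_2)$ | $\frac{p(1 - p)}{1 - q}(\hat{r} - \hat{q})$ | 0 | 0
$\nabla[P_{x\vert y}(0\vert 1) - P_x(0)]$ | $\frac{\alpha_1(1 - \alpha_1)}{\beta_1 + \beta_3}(\hat{\beta}_2 - \hat{\beta}_1)$ | $\frac{p(1 - p)}{q}(\hat{q} - \hat{r})$ | 0 | 0
Expectation
$\nabla[\langle xy \rangle - \langle x \rangle\langle y \rangle]$ | $\alpha_1(1 - \alpha_1)(\hat{\beta}_1 - \hat{\beta}_2)$ | $p(1 - p)(\hat{r} - \hat{q})$ | 0 | 0
Entropy
$\nabla[E_{xy} - E_x - E_y]$ | … | … | 0 | 0
Correlation
$\nabla\rho_{xy}$ | … | … | 0 | 0

TABLE I: A comparison of calculated results for mixed $\mathcal{P}_M$ and behavioural $\mathcal{P}_B$ strategy spaces with those same spaces when
subject to isomorphic constraints. We examine points where respectively the $x$ and $y$ variables are first perfectly correlated
with $\rho_{xy} = 1$ and then independent with $\rho_{xy} = 0$. In the unconstrained spaces, all quantities are evaluated at
points satisfying $\lim_{\beta_1 \to 1}$ or $\lim_{(q,r) \to (0,1)}$ when $\rho_{xy} = 1$, and at points satisfying $\lim_{\beta_1 \to \beta_2}$ or $\lim_{r \to q}$ when $\rho_{xy} = 0$. The
isomorphically constrained spaces are respectively indicated by $\mathcal{P}_M\vert_{\beta_1 = 1}$ and $\mathcal{P}_B\vert_{(q,r) = (0,1)}$ for the perfectly correlated case, and
$\mathcal{P}_M\vert_{\beta_1 = \beta_2}$ and $\mathcal{P}_B\vert_{r = q}$ when the variables are independent. Game theory and probability theory assign different dimensionality
and tangent spaces to these cases. Many calculated results differ between these spaces.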
The behavioural probability space also allows modeling
any arbitrary state of correlation between the $x$ and $y$
variables, where the correlation between $x$ and $y$ is

    $\rho_{xy} = \frac{\sqrt{p(1 - p)}\,(r - q)}{\sqrt{[q + p(r - q)][1 - q - p(r - q)]}}$.        (30)

Then, $x$ and $y$ are perfectly correlated at $\rho_{xy}(p, 0, 1) = 1$,
perfectly anti-correlated at $\rho_{xy}(p, 1, 0) = -1$, and
uncorrelated if either $p = 0$ or $p = 1$ or $q = r$, giving $\rho_{xy} = 0$.
Hence, the decision tree of Fig. [6] encompasses every
possible state of correlation between $x$ and $y$, and thus it can
be used to perform a complete analysis.
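
The special cases quoted for Eq. (30) are easy to confirm numerically. The sketch below is illustrative only: the function names are assumptions, and the map from mixed to behavioural parameters ($p = \alpha_1$, $q = \beta_2 + \beta_3$, $r = \beta_1 + \beta_3$) is obtained here by matching Eq. (21) to Eq. (26) term by term.

```python
import math


def behavioural_correlation(p, q, r):
    """Correlation rho_xy of Eq. (30); requires 0 < p < 1 and 0 < <y> < 1."""
    mean_y = q + p * (r - q)
    return math.sqrt(p * (1 - p)) * (r - q) / math.sqrt(mean_y * (1 - mean_y))


def mixed_to_behavioural(alpha1, beta):
    """Map mixed parameters to (p, q, r) by matching Eqs. (21) and (26)."""
    b0, b1, b2, b3 = beta
    return alpha1, b2 + b3, b1 + b3


# Perfect correlation in the mixed space: beta_1 = 1, all other beta_j = 0.
p, q, r = mixed_to_behavioural(0.3, (0.0, 1.0, 0.0, 0.0))
rho_perfect = behavioural_correlation(p, q, r)
```

Under this map the mixed-space constraint $\beta_1 = 1$ lands on $(q, r) = (0, 1)$ and $\beta_1 = \beta_2$ lands on $r = q$, which is the correspondence used in Table I.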



C. Isomorphic Mixed and Behavioural Spaces

The mixed $\mathcal{P}_M$ and behavioural $\mathcal{P}_B$ strategy spaces
contain embedded probability spaces where $x$ and $y$ are
respectively perfectly correlated, independent, or partially
correlated. As previously, we will now perform a
comparison of probability spaces, both with and without
isomorphic constraints, for various correlation states
between the $x$ and $y$ variables. That is, we will compare
the mixed strategy space $\mathcal{P}_M$ and behavioural strategy
space $\mathcal{P}_B$ with isomorphism constrained mixed and
behavioural strategy spaces as indicated using the following
notation.

The case of perfectly correlated $x$ and $y$ variables is
modeled by the spaces

    $\lim_{\beta_1 \to 1} \mathcal{P}_M$                  mixed
    $\mathcal{P}_M|_{\beta_1 = 1}$                        constrained mixed
    $\lim_{(q,r) \to (0,1)} \mathcal{P}_B$                behavioural
    $\mathcal{P}_B|_{(q,r) = (0,1)}$                      constrained behavioural

In these spaces we expect all of the following to hold:

• $\nabla[P_{xy}(0,0) + P_{xy}(1,1)] = 0$,
• $\nabla[P_{xy}(0,1) + P_{xy}(1,0)] = 0$,
• $\nabla[P_{x|y}(0|0)] = 0$,
• $\nabla[P_{x|y}(0|1)] = 0$,
• $\nabla[\langle x \rangle - \langle y \rangle] = 0$,
• $\nabla[\langle x \rangle - \langle xy \rangle] = 0$,
• $\nabla[\langle y \rangle - \langle xy \rangle] = 0$,
• $\nabla[V(x - y)] = \nabla[V(x) + V(y) - 2\mathrm{cov}(x, y)] = 0$,
• $\nabla[E_{xy} - E_x] = 0$.

Alternately, when $x$ and $y$ are independent, the relevant
spaces are

    $\lim_{\beta_1 \to \beta_2} \mathcal{P}_M$            mixed
    $\mathcal{P}_M|_{\beta_1 = \beta_2}$                  constrained mixed
    $\lim_{r \to q} \mathcal{P}_B$                        behavioural
    $\mathcal{P}_B|_{r = q}$                              constrained behavioural

In all these spaces, the probability distributions satisfy

• $\nabla[P_{xy} - P_x P_y] = 0$,
• $\nabla[P_{x|y} - P_x] = 0$,
• $\nabla[\langle xy \rangle - \langle x \rangle\langle y \rangle] = 0$,
• $\nabla[E_{xy} - E_x - E_y] = 0$.

Table I records whether each of the expected relations
is satisfied for each of the mixed and behavioural spaces
when they are either unconstrained, or isomorphically
constrained. As might be expected, the results indicate
that the weak isomorphisms used to construct the mixed
and behavioural spaces of game theory are not able to
reproduce necessarily true results from probability theory.
Hence, the rational player of game theory is unable to
reliably reproduce results from probability theory. These
differences between game theory and probability theory
need to be resolved.
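
The difference between taking a gradient in the full space and then evaluating it at the constrained point, versus imposing the constraint first, can be seen with a small finite-difference check. This is a sketch under stated assumptions: the helper names are hypothetical, and the example uses only the first conservation relation, $P_{xy}(0,0) + P_{xy}(1,1)$, in the behavioural parameterization of Eq. (26).

```python
def p_agree(p, q, r):
    """P_xy(0,0) + P_xy(1,1) in the behavioural space, from Eq. (26)."""
    return (1 - p) * (1 - q) + p * r


def gradient(f, point, h=1e-6):
    """Central finite-difference gradient of f at point."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        dn = list(point)
        up[i] += h
        dn[i] -= h
        grad.append((f(*up) - f(*dn)) / (2 * h))
    return grad


p0 = 0.3
# Unconstrained space: differentiate in (p, q, r), then evaluate at (q, r) = (0, 1).
unconstrained = gradient(p_agree, (p0, 0.0, 1.0))
# Constrained space: impose (q, r) = (0, 1) first, leaving p as the only coordinate.
constrained = gradient(lambda p: p_agree(p, 0.0, 1.0), (p0,))
```

With $p = 0.3$ the unconstrained gradient is approximately $(0, -0.7, 0.3)$, that is $-(1-p)\hat{q} + p\hat{r}$, while the constrained gradient vanishes because the function is identically 1 along the constraint.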



IV. OPTIMIZING SIMPLE DECISION TREES

We now turn to consider how the differences between
probability theory and game theory influence decision
tree optimization. We consider the usual two potentially
correlated random variables depicted in Fig. [5] and
will use both the unconstrained behavioural probability
space $\mathcal{P}_B$ and the isomorphically constrained behavioural
spaces $\mathcal{P}_B|_{\rho_{xy} = \rho}$ for every value of the correlation state
$\rho \in [-1, 1]$. Our goal is to present an optimization
problem in which a rational player following the rules of game
theory cannot achieve the payoff outcomes of a player
following the rules of probability theory. We suppose that
a player gains a payoff by advising a referee of the
parameters of the decision tree probability space $(p, q, r)$
to optimize a given nonlinear random function. The
referee uses these parameters to determine the value of the
function and provides a payoff equivalent to this value.
(If desired, the referee could estimate the probability
parameters by using indicator functions and observing an
ensemble average of decision tree outcomes.)

There are many possible random functions which we
could use, and some are listed in Table I. We could choose
any relation of the form $f = 0$ where probability theory
shows $\nabla f = 0$ and game theory has $\nabla f \neq 0$. Therefore
$\nabla f$ is effectively a discrepancy vector. We focus on the
squared length of the discrepancy vector
and examine functions of the form $F = 1 - |\nabla f|^2$.
Immediately, probability theory will optimize this function
at the point $F = 1$ while game theory will locate an
optimum at $F < 1$. In particular, we choose

    $f = P_{xy}(0,0) + P_{xy}(1,1)$        (33)

so

    $F = 1 - |\nabla[P_{xy}(0,0) + P_{xy}(1,1)]|^2$
      $= 1 - |\nabla[1 - q + p(q + r - 1)]|^2$.        (34)

In the unconstrained behavioural space $\mathcal{P}_B$, a rational
player will evaluate this as

    $F = 1 - (1 - q - r)^2 - (1 - p)^2 - p^2$.        (35)

In turn, this will be maximized at points $p = \frac{1}{2}$ and
$q + r = 1$ to give a maximum payoff of $F_{\max} = \frac{1}{2}$.
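
The unconstrained maximum can be confirmed with a brute-force search over the $(p, q, r)$ cube. The grid resolution and the name `payoff_F` are arbitrary choices for this sketch; the function itself is just the expression for $F$ in Eq. (35).

```python
def payoff_F(p, q, r):
    """Eq. (35): F = 1 - |grad(P_xy(0,0) + P_xy(1,1))|^2 in the unconstrained space."""
    return 1 - (1 - q - r) ** 2 - (1 - p) ** 2 - p ** 2


steps = 20
grid = [i / steps for i in range(steps + 1)]
# Each candidate is (F, p, q, r); max() picks the largest F.
best = max((payoff_F(p, q, r), p, q, r) for p in grid for q in grid for r in grid)
```

The search returns $F = 1/2$ at $p = 1/2$ with $q + r = 1$; the optimum in $(q, r)$ is degenerate along that line, so the particular $(q, r)$ reported depends only on tie-breaking.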
A contrasting result is obtained using the isomorphism
constraints of probability theory where our player faces
the optimization problem

    $\max F = 1 - |\nabla[1 - q + p(q + r - 1)]|^2$
    subject to $\rho_{xy} = \rho$, $\forall \rho \in [-1, 1]$.        (36)

Our player might commence by adopting the constraint
$\rho_{xy} = 1$ implemented by $(q, r) = (0, 1)$ to give

    $\max F = 1 - \left|\nabla[1 - q + p(q + r - 1)]\big|_{(q,r) = (0,1)}\right|^2 = 1$.        (37)

This analysis leads to an optimum point at arbitrary $p$
and $(q, r) = (0, 1)$ and a maximum payoff of $F_{\max} = 1$.
Self-evidently, the player would cease their optimization
analysis at this point as the achieved maximum cannot be
improved.

Of course, there are many random functions defined
over decision trees which produce identical results whether
or not isomorphic constraints are used. We now briefly
illustrate this using polylinear expected payoff functions,
and consider optimizing the function

    $\max \langle\Pi\rangle = 2\langle x \rangle + 3\langle y \rangle - 4\langle xy \rangle$
    subject to $\rho_{xy} = \rho$, $\forall \rho \in [-1, 1]$        (38)

over the decision tree of Fig. [5]. Of course, simple inspection
will locate the optimum at $(\langle x \rangle, \langle y \rangle) = (0, 1)$ giving
an expected payoff of $\langle\Pi\rangle = 3$. However, we step through
the process for later generalization to strategic games.

There are an infinite number of correlation constraints
to be examined, but several are straightforward. When
the variables are perfectly correlated at $\rho_{xy} = 1$ via the
constraint $(q, r) = (0, 1)$, we have $\langle x \rangle = \langle y \rangle = \langle xy \rangle$ giving

    $\langle\Pi\rangle = \langle x \rangle$.        (39)

This is optimized by setting $\langle x \rangle = 1$ giving an expected
payoff of $\langle\Pi\rangle = 1$. Conversely, when $\rho_{xy} = 0$ and $x$ and
$y$ are independent, as occurs when using the constraint
$r = q$, then the expectations are separable giving
$\langle xy \rangle = \langle x \rangle\langle y \rangle$ and

    $\langle\Pi\rangle = 2\langle x \rangle + 3\langle y \rangle - 4\langle x \rangle\langle y \rangle$.        (40)

As the $\langle x \rangle$ and $\langle y \rangle$ variables are independent, a check
of internal stationary points and the boundary leads to
an optimal point at $(\langle x \rangle, \langle y \rangle) = (0, 1)$ and an expected
payoff of $\langle\Pi\rangle = 3$.

More general correlation states require use of, for
instance, standard Lagrangian optimization procedures.
However, we here adopt a numerical optimization
approach by first using the correlation constraint to write
the $r$ variable as a function of $p$, $q$ and the correlation
constant $\rho$, $r = r_+(p, q, \rho)$ (see Eq. (B1)). The constraint
$0 \le r \le 1$ places limits on the permissible values of $(p, q)$
and these are detailed in Eqs. (B2) and (B3). The problem
is then solved using a typical Mathematica command
line of [11]

    NMaximize[{inRange[r+(p, q, ρ)] × [2p + 3q − 3pq − p r+(p, q, ρ)],
               0 ≤ p ≤ 1 && 0 ≤ q ≤ 1}, {p, q}].        (41)

Here, a suitably defined "inRange" function determines
whether $r_+$ is taking permissible values between zero and
unity, allowing the payoff function to be examined over
the entire $(p, q)$ plane. The resulting optimal expected
payoffs follow:

    $\rho$      $(p, q, r)$                 $\langle\Pi\rangle$
    +1          (1., 0., 1.)                1.
    +0.75       (0.8138, 0.3876, 1.)        1.03032
    +0.5        (0.4831, 0.5917, 1.)        1.40068
    +0.25       (0.2590, 0.7953, 1.)        2.02693
    0           (0., 1., 1.)                3.
    -0.25       (0., 1., 0.9378)            3.
    -0.5        (0., 1., 0.7506)            3.
    -0.75       (0., 1., 0.4386)            3.
    -1          (0., 1., 0.)                3.
                                                        (42)



Some care must be taken to ensure convergence of the
solutions. This analysis makes it evident that the player
can maximize expected payoffs by choosing a correlation
constraint where $x$ and $y$ are independent (say), allowing
the setting $(p, q, r) = (0, 1, 1)$ to gain a payoff of $\langle\Pi\rangle = 3$.
Other choices would also have been possible.

We now turn to applying isomorphism constraints to 
the strategic analysis of game theory. 



V. OPTIMIZING A MULTISTAGE GAME TREE 

In this section, we show that the use of isomorphic
constraints can alter the outcomes of strategic games even
when expected payoff functions are being used. As usual,
we will consider either the behavioural strategy space $\mathcal{P}_B$
(Eq. (26)) or the isomorphically constrained behavioural
spaces $\mathcal{P}_B|_{\rho_{xy} = \rho}$ for every value of the correlation state
$\rho \in [-1, 1]$.

We consider a strategic interaction between two play- 
ers over multiple stages as depicted in the behavioural 
strategy space of Fig. H3 Here, two players denoted X 
and Y seek to optimize their respective payoffs 

    $X: \max_x \Pi^X(x, y) = 3 - 2x - y + 4xy$
    $Y: \max_y \Pi^Y(x, y) = 1 + 3x + y - 2xy$.        (43)

We assume that player X chooses the value of $x$ and
advises this to Y before Y determines the value of $y$.

In the unconstrained behavioural strategy space $\mathcal{P}_B$,
this perfect information game is optimized using backwards
induction to give the pure strategy choices $(x, y) =
(0, 1)$ achieving payoffs of $(\Pi^X, \Pi^Y) = (2, 2)$.
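
The backwards induction step can be written out explicitly for this two-move game. The sketch below is illustrative (the function names are not from the original) and simply enumerates the pure strategies using the payoffs of Eq. (43).

```python
def payoff_x(x, y):
    """Player X's payoff from Eq. (43)."""
    return 3 - 2 * x - y + 4 * x * y


def payoff_y(x, y):
    """Player Y's payoff from Eq. (43)."""
    return 1 + 3 * x + y - 2 * x * y


def backwards_induction():
    """Y best-responds to each announced x; X then picks x anticipating that reply."""
    reply = {x: max((0, 1), key=lambda y: payoff_y(x, y)) for x in (0, 1)}
    x_star = max((0, 1), key=lambda x: payoff_x(x, reply[x]))
    y_star = reply[x_star]
    return (x_star, y_star), (payoff_x(x_star, y_star), payoff_y(x_star, y_star))


choices, outcome = backwards_induction()
```

Player Y replies $y = 1$ to $x = 0$ and $y = 0$ to $x = 1$, so X compares payoffs of 2 and 1 and chooses $x = 0$, reproducing $(x, y) = (0, 1)$ and payoffs $(2, 2)$.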

We now consider the constrained behavioural spaces
$\mathcal{P}_B|_{\rho_{xy} = \rho}$, $\forall \rho \in [-1, 1]$. The two players are
non-communicating and it is generally not possible to use
a single value for the correlation $\rho$, and this generally
makes the analysis intractable. However, player Y has
total control over the setting of the correlation $\rho$ in three
cases, namely when $\rho = \pm 1$ and $\rho = 0$. We consider these cases
now. First consider the space $\mathcal{P}_B|_{\rho_{xy} = 1}$ in which the
variables are functionally equal so $y = x = xy$. In this space
the players face the respective optimization tasks

    $X: \max_x \Pi^X(x) = 3 + x$
    $Y: \Pi^Y(x) = 1 + 2x$.        (44)

As a result, player X optimizes their payoff by setting $x = 1$,
giving the outcomes $(\Pi^X, \Pi^Y) = (4, 3)$. In contrast,
in the space $\mathcal{P}_B|_{\rho_{xy} = -1}$ the variables are functionally
related by $y = 1 - x$ and $xy = 0$. These constraints
render the optimization tasks as

    $X: \max_x \Pi^X(x) = 2 - x$
    $Y: \Pi^Y(x) = 2 + 2x$.        (45)

Here, player X chooses $x = 0$ to optimize their payoff,
leading to the outcomes $(\Pi^X, \Pi^Y) = (2, 2)$. Finally, when
player Y chooses to discard all information about the $x$
variable, then the variables $x$ and $y$ are independent and
the chosen space is $\mathcal{P}_B|_{\rho_{xy} = 0}$. In this space, there are
no pure strategy solutions and the players will optimize
expected payoffs. We have $\langle x \rangle = p$ and $\langle y \rangle = q$ and
$\langle xy \rangle = \langle x \rangle\langle y \rangle = pq$, giving the optimization problem

    $X: \max_p \langle\Pi^X\rangle = 3 - 2p - q + 4pq$
    $Y: \max_q \langle\Pi^Y\rangle = 1 + 3p + q - 2pq$.        (46)

The best response functions or equivalent partial
differentials are

    $X: \frac{\partial\langle\Pi^X\rangle}{\partial p} = -2 + 4q$
    $Y: \frac{\partial\langle\Pi^Y\rangle}{\partial q} = 1 - 2p$        (47)

locating the optimal point at $(p, q) = (\frac{1}{2}, \frac{1}{2})$ with
expected payoffs of $(\langle\Pi^X\rangle, \langle\Pi^Y\rangle) = (\frac{5}{2}, \frac{5}{2})$.

At this stage of the analysis, both players have
separately calculated an equilibrium point in three spaces
$\mathcal{P}_B|_{\rho_{xy} = \rho}$ for $\rho \in \{-1, 0, 1\}$, and the selection of these
correlation states is solely at the discretion of player Y.
The expected payoffs gained at each of these "local"
equilibrium points can then be compared to obtain a "global"
optimal expected payoff. For convenience, these are
summarized here:

    $\rho$     $(\langle\Pi^X\rangle, \langle\Pi^Y\rangle)$
    -1         (2, 2)
    0          (5/2, 5/2)
    +1         (4, 3).        (48)

Based on these results, player Y will then rationally
optimize their expected payoff by choosing to have their
variables in a state of perfect correlation with $\rho = 1$ in
the space $\mathcal{P}_B|_{\rho_{xy} = 1}$. Player X, also being a rational
optimizer, will play accordingly to give equilibrium payoffs
$(\langle\Pi^X\rangle, \langle\Pi^Y\rangle) = (4, 3)$.
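
The three local equilibria and player Y's choice between them can be tabulated in a few lines. This is a minimal sketch with hypothetical names; it takes the optimal points in each constrained space as given in the text and only evaluates the payoffs of Eq. (43) at those points.

```python
from fractions import Fraction


def payoffs(x, y, xy):
    """(Pi^X, Pi^Y) of Eq. (43) written in terms of <x>, <y> and <xy>."""
    return (3 - 2 * x - y + 4 * xy, 1 + 3 * x + y - 2 * xy)


# rho = +1: y = x = xy, and X prefers x = 1 because Pi^X = 3 + x.
perfect = payoffs(1, 1, 1)
# rho = -1: y = 1 - x and xy = 0, and X prefers x = 0 because Pi^X = 2 - x.
anti = payoffs(0, 1, 0)
# rho = 0: the stationary point (p, q) = (1/2, 1/2) of Eq. (47), with <xy> = p*q.
half = Fraction(1, 2)
independent = payoffs(half, half, half * half)

table = {-1: anti, 0: independent, 1: perfect}
# Player Y controls the correlation state and picks the one maximizing Pi^Y.
best_rho_for_y = max(table, key=lambda rho: table[rho][1])
```

Player Y's payoffs across the three states are 2, 5/2 and 3, so Y selects $\rho = +1$, and the resulting outcome $(4, 3)$ improves on the backwards-induction outcome $(2, 2)$ for both players.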

As noted above, the more general treatment of a strate- 
gic game, even one as simple as this one, appears in- 
tractable. 



VI. CONCLUSION 

A rational player must compare expected payoffs 
across the mixed strategy space in order to locate equi- 
libria. As expectations are polylinear, such comparisons 
are mathematically equivalent to calculating gradients
and the issues raised in this paper apply. Further, it
is perfectly possible that a rational player might need to
calculate the Fisher information defined in terms of gra- 
dients of probability distributions in order to optimize 
payoffs. It is perfectly possible that a rational player 
might need to optimize an Entropy gradient to maxi- 
mize a payoff. It is even possible to define games where 
payoffs depend directly on the gradient of a probability 
distribution — shine light through glass sheets painted by 
players to alter transmission probabilities and make pay- 
offs dependent on the resulting light intensity gradients 
(call it the interior decorating game). This paper has 
shown that rational players working with the standard 
strategy spaces of game theory will have difficulties with 
these games. 

This paper has highlighted two alternate ways to
optimize a multivariate function $\Pi(x, y)$ where $x$ and $y$ might
be functionally related in different ways, $y = g_i(x)$ for
different $i$ say. The first approach, common to
probability theory and general optimization theory,
considers each potential functional relation as occupying a
distinct space and approaches the optimization as a choice
between distinct spaces. Any uncertainty about which
space to choose does not leak into the properties of any
individual space. If desired, isomorphic constraints can
be used to embed all these distinct spaces into a single
enlarged space for convenience, but if so, all the
properties of the optimization problem are exactly preserved.
The second approach, common to game theory, holds
that the uncertainty about which functional relation to
choose should appear in the same space as the variables
$(x, y)$. This is accomplished by expanding the size of
the space to include both the old variables $x$ and $y$ and
sufficient new variables (not explicitly shown here) to
contain all the potential functional relations and allow
$\lim_{y \to g_i(x)} \Pi(x, y) = \Pi[x, g_i(x)]$ for all $i$. This enlarged
space then allows gradient comparisons to be made at
points $\Pi[x, g_i(x)] = \Pi[x, g_j(x)]$ for all $i$ and $j$ to locate
optima. These two approaches can lead to conflicting
optimization outcomes because, while they generally
assign the same values to functions at all points,

    $\Pi(x, y)\big|_{y = g_i(x)} = \lim_{y \to g_i(x)} \Pi(x, y)$,        (49)

they typically calculate different gradients at those same
points

    $\nabla\Pi(x, y)\big|_{y = g_i(x)} \neq \lim_{y \to g_i(x)} \nabla\Pi(x, y)$.        (50)

These differences can be extreme when the function
$\Pi(x, y)$ depends on global properties of the space, such as the
dimension, volume, gradient, information or entropy.
In its approach, game theory differs from many other
fields including other fields of economics. For example,
the Euler-Lagrange equations of Ramsey-type models
consider the functional variation of some function
$u[y(x), y'(x)]$ while ensuring a consistent treatment of the
function $y(x)$ and its gradient $y'(x)$ [14]. Gradients are
not taken in limits in these fields.

Throughout this paper, we have presumed that a ra- 
tional player should be able to use standard techniques 
from either probability theory or optimization theory on 
the one hand, or decision theory and game theory on 
the other, and expect all of these methods to provide 
consistent results. We have shown that when considering
multiple, potentially correlated variables, and functions
of these variables dependent on the geometry of
the probability parameter space, then these methods can
give rise to contradictory optimization outcomes. We
have suggested decision and game theory are incomplete
when they require the adoption of a single geometry for
any decision or game tree, and that these fields should
consider applying the alternate geometries of probability
theory and optimization theory. Recognizing that a
single multi-stage decision or game tree can encompass
an infinite number of incommensurate probability spaces
might resolve some of the paradoxes of game theory, and
have broader application.
Appendix A: Optimization and isomorphic
probability spaces

1. Isomorphic dice

In each respective die space, the gradient operator is

    $\nabla = \sum_{i=1}^{n-1} \hat{p}_i \frac{\partial}{\partial p_i}$,        (A1)

where a hatted variable $\hat{p}_i$ is a unit vector in the indicated
direction and we resolve the normalization constraint via
$p_n = 1 - \sum_{i=1}^{n-1} p_i$. For the coin, we have

    $V = \int_0^1 da \int_0^1 db\; \delta_{a+b=1} = \int_0^1 da = 1$
    $E_x = -[a \log(a) + (1 - a)\log(1 - a)]$
    $\nabla E_x = -\hat{a} \log\frac{a}{1 - a}$.        (A2)

For the triangle, the equivalent functions are

    $V = \int_0^1 da \int_0^1 db \int_0^1 dc\; \delta_{a+b+c=1} = \int_0^1 da \int_0^{1-a} db = \frac{1}{2}$
    $E_x = -[a \log(a) + b \log(b) + (1 - a - b)\log(1 - a - b)]$
    $\nabla E_x = -\hat{a} \log\frac{a}{1 - a - b} - \hat{b} \log\frac{b}{1 - a - b}$.        (A3)

Finally, for the square, we have

    $V = \int_0^1 da \int_0^1 db \int_0^1 dc \int_0^1 dd\; \delta_{a+b+c+d=1} = \int_0^1 da \int_0^{1-a} db \int_0^{1-a-b} dc = \frac{1}{6}$
    $E_x = -[a \log(a) + b \log(b) + c \log(c) + (1 - a - b - c)\log(1 - a - b - c)]$
    $\nabla E_x = -\hat{a} \log\frac{a}{1 - a - b - c} - \hat{b} \log\frac{b}{1 - a - b - c} - \hat{c} \log\frac{c}{1 - a - b - c}$.        (A4)

The function $F(a, b, c)$ has a directed gradient in the
direction $\frac{1}{\sqrt{2}}(1, -1, 0)$ of

    $\nabla F(a, b, c) \cdot \frac{1}{\sqrt{2}}(1, -1, 0) = \frac{1}{\sqrt{2}} \log\frac{b}{a}$        (A5)

using Eq. (A4). At points where $(a, b, c) = (a, 1 - a, 0)$ this
gives a directed gradient of

    $\nabla F(a, 1 - a, 0) \cdot \frac{1}{\sqrt{2}}(1, -1, 0) = \frac{1}{\sqrt{2}} \log\frac{1 - a}{a}$        (A6)

which is optimized at $(a, b, c) = (\frac{1}{2}, \frac{1}{2}, 0)$.
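
The directed gradient in Eq. (A5) can be checked by finite differences. The sketch below assumes that $F(a, b, c)$ is the square-die entropy whose gradient is given in Eq. (A4), which is how the phrase "using Eq. (A4)" is read here; the helper names are illustrative.

```python
import math


def entropy(a, b, c):
    """Entropy of the four-sided (square) die with d = 1 - a - b - c."""
    probs = (a, b, c, 1 - a - b - c)
    return -sum(p * math.log(p) for p in probs)


def directional_derivative(f, point, direction, h=1e-6):
    """Central finite difference of f along a unit direction."""
    up = [x + h * d for x, d in zip(point, direction)]
    dn = [x - h * d for x, d in zip(point, direction)]
    return (f(*up) - f(*dn)) / (2 * h)


a, b, c = 0.2, 0.5, 0.1
unit = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
numeric = directional_derivative(entropy, (a, b, c), unit)
closed_form = math.log(b / a) / math.sqrt(2)  # right-hand side of Eq. (A5)
```

The two values agree to numerical precision at interior points, and the closed form vanishes when $a = b$, consistent with the stated optimum at $(\frac{1}{2}, \frac{1}{2}, 0)$ along the edge $c = 0$, $b = 1 - a$.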



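2. Continuous bivariate Normal spaces

Two continuous independent and normally distributed
random variables $x$ and $y$ with respective means $\mu_x$ and
$\mu_y$ and standard deviations $\sigma_x$ and $\sigma_y$ have joint and
marginal distributions of

    $P_{xy} = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-\frac{(x - \mu_x)^2}{2\sigma_x^2} - \frac{(y - \mu_y)^2}{2\sigma_y^2}\right]$
    $P_x = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\left[-\frac{(x - \mu_x)^2}{2\sigma_x^2}\right]$
    $P_y = \frac{1}{\sqrt{2\pi}\,\sigma_y} \exp\left[-\frac{(y - \mu_y)^2}{2\sigma_y^2}\right]$.        (A7)

The conditional distribution for $x$ given some value of $y$
is

    $P_{x|y} = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\left[-\frac{(x - \mu_x)^2}{2\sigma_x^2}\right]$.        (A8)

Two random normally distributed variables $x$ and $y$ with
correlation value $\rho$ have a joint distribution

    $P'_{xy} = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{(x - \mu_x)^2}{\sigma_x^2} - \frac{2\rho(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y} + \frac{(y - \mu_y)^2}{\sigma_y^2}\right]\right\}$.        (A9)

The marginal distributions for the correlated case are
identical to those of the independent space so $P'_x = P_x$
and $P'_y = P_y$. The conditional distribution for $x$ given
some value of $y$ is

    $P'_{x|y} = \frac{1}{\sqrt{2\pi}\,\sigma_x\sqrt{1 - \rho^2}} \exp\left[-\frac{(x - \bar{\mu}_x)^2}{2\sigma_x^2(1 - \rho^2)}\right]$        (A10)

where the new conditioned mean is

    $\bar{\mu}_x = \mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y)$.        (A11)

In the enlarged distribution space, the gradient operator
is

    $\nabla = \hat{x}\frac{\partial}{\partial x} + \hat{y}\frac{\partial}{\partial y} + \hat{\mu}_x\frac{\partial}{\partial\mu_x} + \hat{\mu}_y\frac{\partial}{\partial\mu_y} + \hat{\sigma}_x\frac{\partial}{\partial\sigma_x} + \hat{\sigma}_y\frac{\partial}{\partial\sigma_y} + \hat{\rho}\frac{\partial}{\partial\rho}$.        (A12)

When suitably constrained by an isomorphism, the
enlarged distribution satisfies

    $\nabla[P'_{xy} - P'_x P'_y]\big|_{\rho = 0} = 0$
    $\nabla[P'_{x|y} - P'_x]\big|_{\rho = 0} = 0$.        (A13)

Conversely, when the parameter $\rho$ is not externally
constrained then these required relations do not hold even
in the limit as $\rho \to 0$, as

    $\lim_{\rho \to 0} \nabla[P'_{xy} - P'_x P'_y] = \hat{\rho} \lim_{\rho \to 0} \frac{\partial}{\partial\rho} P'_{xy} \neq 0$
    $\lim_{\rho \to 0} \nabla[P'_{x|y} - P'_x] = \hat{\rho} \lim_{\rho \to 0} \frac{\partial}{\partial\rho} P'_{x|y} \neq 0$.        (A14)

Expectations of the $x$ and $y$ variables must also satisfy
certain gradient relations. As expectations integrate over
the $x$ and $y$ variables, the gradient operator is a function
of only five variables now,

    $\nabla = \hat{\mu}_x\frac{\partial}{\partial\mu_x} + \hat{\mu}_y\frac{\partial}{\partial\mu_y} + \hat{\sigma}_x\frac{\partial}{\partial\sigma_x} + \hat{\sigma}_y\frac{\partial}{\partial\sigma_y} + \hat{\rho}\frac{\partial}{\partial\rho}$.        (A15)

We then have

    $\nabla[\langle xy \rangle' - \langle x \rangle'\langle y \rangle']\big|_{\rho = 0} = 0$
    $\lim_{\rho \to 0} \nabla[\langle xy \rangle' - \langle x \rangle'\langle y \rangle'] = \hat{\rho} \lim_{\rho \to 0} \frac{\partial}{\partial\rho}\langle xy \rangle' \neq 0$.        (A16)

3. Joint probability space optimization

The gradient operator in the probability space of the
square dice with probability parameters $(a, b, c)$ is

    $\nabla = \hat{a}\frac{\partial}{\partial a} + \hat{b}\frac{\partial}{\partial b} + \hat{c}\frac{\partial}{\partial c}$,        (A17)

where a hat indicates a unit vector in the indicated
direction.

a. Perfectly correlated probability spaces

We compare calculations when $x$ and $y$ are perfectly
correlated at points $(a, 0, 0, 1 - a)$ in the isomorphically
constrained space $\mathcal{P}_{corr}$ and in the non-constrained space
$\mathcal{P}'_{corr}$.

The joint entropy between $x$ and $y$ is

    $E_{xy}(a, b, c) = -a \log a - b \log b - c \log c - (1 - a - b - c)\log(1 - a - b - c)$        (A18)

giving respective gradients in the $\mathcal{P}_{corr}$ and $\mathcal{P}'_{corr}$ spaces
of

    $\nabla E_{xy}\big|_{b=c=0} = -\hat{a} \log\frac{a}{1 - a}$
    $\lim_{(b,c) \to (0,0)} \nabla E_{xy} = \lim_{(b,c) \to (0,0)} \left[-\hat{a} \log\frac{a}{1 - a - b - c} - \hat{b} \log\frac{b}{1 - a - b - c} - \hat{c} \log\frac{c}{1 - a - b - c}\right]$
                                          $=$ undefined.        (A19)

Equating these gradients to zero locates the maximum at
$(a, b, c) = (\frac{1}{2}, 0, 0)$ in $\mathcal{P}_{corr}$ and at $(a, b, c) = (\frac{1}{4}, \frac{1}{4}, \frac{1}{4})$ in
$\mathcal{P}'_{corr}$.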

Writing $(a, b, c) = (p_1, p_2, p_3)$, the Fisher Information
is a matrix with elements $i, j \in \{1, 2, 3\}$ with

    $F_{ij} = \sum_{xy} P_{xy} \left(\frac{\partial}{\partial p_i} \log P_{xy}\right)\left(\frac{\partial}{\partial p_j} \log P_{xy}\right)$.        (A20)

When isomorphically constrained in the space $\mathcal{P}_{corr}$, the
Fisher Information is $F_{ij}|_{b=c=0}$ with the only nonzero
term being

    $F_{aa} = a\left[\frac{\partial}{\partial a} \log a\right]^2 + (1 - a)\left[\frac{\partial}{\partial a} \log(1 - a)\right]^2 = \frac{1}{a(1 - a)}$.        (A21)

This means that the smaller the variance, the more
information is obtained about $a$. In the unconstrained space
$\mathcal{P}'_{corr}$, the Fisher Information is a very different $3 \times 3$
matrix.
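
The contrast between the constrained scalar and the unconstrained $3 \times 3$ matrix can be made concrete. The sketch below evaluates Eq. (A20) for the two cases; the function names are hypothetical, and the unconstrained entries use the standard categorical-distribution result $F_{ij} = \delta_{ij}/p_i + 1/d$ with $d = 1 - a - b - c$, which follows from Eq. (A20) but is not written out in the text.

```python
def fisher_constrained(a):
    """Single nonzero Fisher term when b = c = 0, as in Eq. (A21)."""
    score_00 = 1 / a          # d/da log a, for outcome (0,0) with probability a
    score_11 = -1 / (1 - a)   # d/da log(1 - a), for outcome (1,1) with probability 1 - a
    return a * score_00 ** 2 + (1 - a) * score_11 ** 2


def fisher_unconstrained(a, b, c):
    """3x3 Fisher matrix for parameters (a, b, c) with d = 1 - a - b - c."""
    d = 1 - a - b - c
    diag = (1 / a, 1 / b, 1 / c)
    return [[(diag[i] if i == j else 0.0) + 1 / d for j in range(3)] for i in range(3)]
```

The unconstrained matrix has entries that diverge as $b \to 0$ or $c \to 0$, so it has no finite limit at the perfectly correlated points, whereas the constrained value $1/[a(1-a)]$ is finite for $0 < a < 1$.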

The likelihood function estimates probability parameters
from the observation of $n$ trials with $n_a$ appearances
of event $(x, y) = (0, 0)$, $n_b$ appearances of event
$(x, y) = (0, 1)$, $n_c$ appearances of event $(x, y) = (1, 0)$,
and $n_d$ appearances of event $(x, y) = (1, 1)$. We have
$n_a + n_b + n_c + n_d = n$, giving the Likelihood function

    $L = f(n_a, n_b, n_c, n)\, a^{n_a} b^{n_b} c^{n_c} (1 - a - b - c)^{n - n_a - n_b - n_c}$        (A22)

where $f(n_a, n_b, n_c, n)$ gives the number of combinations.
The optimization proceeds by evaluating the gradient of
the Log Likelihood function. When isomorphically
constrained in the space $\mathcal{P}_{corr}$, the gradient of the Log
Likelihood function is

    $\nabla \log L\big|_{b=c=0} = \hat{a}\left[\frac{n_a}{a} - \frac{n - n_a}{1 - a}\right]$        (A23)

which equated to zero gives the optimal estimate at
$a = n_a/n$ and $n_b = n_c = 0$ as expected. Conversely, when
unconstrained in the space $\mathcal{P}'_{corr}$, the gradient of the Log
Likelihood function evaluates as

    $\nabla \log L = \hat{a}\left[\frac{n_a}{a} - \frac{n - n_a - n_b - n_c}{1 - a - b - c}\right]
                   + \hat{b}\left[\frac{n_b}{b} - \frac{n - n_a - n_b - n_c}{1 - a - b - c}\right]
                   + \hat{c}\left[\frac{n_c}{c} - \frac{n - n_a - n_b - n_c}{1 - a - b - c}\right]$.        (A24)

This is obviously a very different result, though at points
$(a, b, c) = (a, 0, 0)$ equating the gradient of the Log
Likelihood to zero locates the same estimate as before of
$a = n_a/n$ and $n_b = n_c = 0$.
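
The constrained estimate can be verified directly. The following sketch covers only the constrained space $b = c = 0$ and drops the constant combinatorial factor $f$ from the log likelihood; the sample counts and function names are arbitrary illustrations.

```python
import math


def log_likelihood(a, n_a, n):
    """Log likelihood in the constrained space b = c = 0, omitting the constant log f term."""
    return n_a * math.log(a) + (n - n_a) * math.log(1 - a)


def score(a, n_a, n):
    """Derivative of the log likelihood with respect to a, as in Eq. (A23)."""
    return n_a / a - (n - n_a) / (1 - a)


n_a, n = 37, 100
a_hat = n_a / n  # the estimate a = n_a / n quoted in the text
```

The score vanishes at `a_hat`, and the log likelihood is lower at any other value of $a$ in $(0, 1)$, as expected for the maximum likelihood estimate of a binomial proportion.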

In the unconstrained probability space $\mathcal{P}'_{corr}$, the
expectation, variance, and entropy relations of interest
evaluate as

    $\langle x \rangle - \langle y \rangle = c - b$
    $V(x) - V(y) = (c - b)(a - d)$
    $E_{xy} - E_x = [(a + b)\log(a + b) + (1 - a - b)\log(1 - a - b)]$
                  $- [a \log a + b \log b + c \log c + (1 - a - b - c)\log(1 - a - b - c)]$,        (A25)

which in the limit gives an undefined gradient

    $\lim_{(b,c) \to (0,0)} \nabla[E_{xy} - E_x] =$ undefined.        (A26)

b. Independent probability spaces

For the square die under consideration, we have
probabilities and expectations of

    $P_{xy}(0,0) - P_x(0)P_y(0) = ad - bc$
    $\langle xy \rangle - \langle x \rangle\langle y \rangle = ad - bc$
    $P_{x|y}(0|0) - P_x(0) = \frac{ad - bc}{a + c}$,        (A27)

and entropies of

    $E_x = -(a + b)\log(a + b) - (1 - a - b)\log(1 - a - b)$
    $E_y = -(a + c)\log(a + c) - (1 - a - c)\log(1 - a - c)$
    $E_{xy} = -a \log a - b \log b - c \log c - d \log d$,        (A28)

giving gradients of 



lim V[E xy -E x -E y ] 

ad^-bc 



lim V < a log 

ad^bc 



clog 



d a — ad + be 
a d — ad + be 
d c + ad — be 
c d — ad + be 



b log 



(A29) 
d b + ad — be 



log 



b d — ad + be 

(} ■ ml be , ^ 



Appendix B: Optimizing simple decision trees

When the correlation (Eq. (30)) between $x$ and $y$ is
$\rho_{xy} = \rho$, and as long as both $p \neq 0$ and $p \neq 1$, then the
correlation constraint defines two surfaces in the $(p, q, r)$
simplex at height

    $r_\pm(p, q, \rho) = \frac{\rho^2 - 2q(1 - p)(\rho^2 - 1) \pm \rho\sqrt{\rho^2 + 4q(1 - q)\frac{1 - p}{p}}}{2[1 + p(\rho^2 - 1)]}$.        (B1)

The function $r_+(p, q, \rho)$ will give the required correlation
surfaces within the simplex. That is, when $\rho = 0$ we
have $r_+(p, q, 0) = q$ as required. Similarly, when $\rho = 1$
we have $r_+(p, q, 1) \ge 1$ across the entire $(p, q)$ plane with
the equality $r_+(p, q, 1) = 1$ only where $q = 0$ or $q = 1$.
We require $\rho = 1$ at $(q, r) = (0, 1)$. Finally, when
$\rho = -1$ and $x$ and $y$ are perfectly anti-correlated, we have
$r_+(p, q, -1) \le 0$ across the entire $(p, q)$ plane with the
equality $r_+(p, q, -1) = 0$ only where $q = 0$ or $q = 1$. We
require $\rho = -1$ at $(q, r) = (1, 0)$.

The strict requirement that $0 \le r_+(p, q, \rho) \le 1$
establishes permissible regions on the $(p, q)$ plane. For
$0 < \rho < 1$, the permissible region is bounded by the
$q = 0$ line and the line

    $q(p, \rho) = \frac{p}{p + \frac{\rho^2}{1 - \rho^2}}$.        (B2)



Similarly, for $-1 < \rho < 0$, the $(p, q)$ region is bounded by
the $q = 1$ line and the line

    $q(p, \rho) = 1 - \frac{p}{p + \frac{\rho^2}{1 - \rho^2}}$.        (B3)
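
The boundary lines of the permissible region can be checked against the correlation surface by substitution. In this sketch `r_plus` encodes the closed form read from Eq. (B1), `q_upper` the line of Eq. (B2), and `q_lower` the line on which $r_+ = 0$ for negative $\rho$; the last of these is derived here by setting $r = 0$ in Eq. (30) and should be treated as an assumption about the intended form of Eq. (B3). All names are illustrative.

```python
import math


def r_plus(p, q, rho):
    """Upper root r_+(p, q, rho) of the correlation constraint; requires 0 < p < 1."""
    root = math.sqrt(rho ** 2 + 4 * q * (1 - q) * (1 - p) / p)
    numerator = rho ** 2 - 2 * q * (1 - p) * (rho ** 2 - 1) + rho * root
    return numerator / (2 * (1 + p * (rho ** 2 - 1)))


def q_upper(p, rho):
    """Line on which r_+ = 1 for 0 < rho < 1, Eq. (B2)."""
    return p / (p + rho ** 2 / (1 - rho ** 2))


def q_lower(p, rho):
    """Line on which r_+ = 0 for -1 < rho < 0 (derived here from r = 0 in Eq. (30))."""
    return 1 - p / (p + rho ** 2 / (1 - rho ** 2))
```

For example, `q_upper(0.8138, 0.75)` evaluates to approximately 0.3876, matching the $\rho = +0.75$ row of the table of optimal payoffs, where the optimum sits on the $r = 1$ boundary.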



